Webpage Intelligent Parsing Algorithm Based on Text and Symbol Density
Abstract
Intelligent web page parsing is an essential part of data collection. News web pages contain a lot of information with little relevance to the topic, which makes it difficult to locate the text content directly and quickly during the collection process. This paper proposes an algorithm based on text and symbol density. Empirical research on mainstream news websites in China shows that the algorithm can accurately extract the text content of news pages.
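The abstract does not give the scoring formulas, so the following is only a minimal sketch of the general idea of density-based extraction, not the paper's method. It assumes that text density is the number of non-link characters per tag in a block and that symbol density is reflected by the count of punctuation marks; the score `text_density * log(punctuation + 2)`, the candidate tag list, and all names (`Node`, `TreeBuilder`, `extract_main_text`) are hypothetical choices made for illustration.

```python
import math
from html.parser import HTMLParser

PUNCTUATION = set(",.;:!?，。；：！？、")   # assumed symbol set
SKIP_TAGS = {"script", "style"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}
BLOCK_TAGS = {"div", "article", "section", "td"}  # assumed candidate containers


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children = tag, parent, []
        self.text = ""  # text directly inside this element
        if parent is not None:
            parent.children.append(self)


class TreeBuilder(HTMLParser):
    """Builds a very small element tree from HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text and self.current.tag not in SKIP_TAGS:
            self.current.text = (self.current.text + " " + text).strip()


def iter_nodes(node):
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def stats(node, in_link=False):
    """Return (chars, link_chars, punctuation, tags) for a subtree."""
    in_link = in_link or node.tag == "a"
    chars = len(node.text)
    link_chars = chars if in_link else 0
    punct = sum(ch in PUNCTUATION for ch in node.text)
    tags = 1
    for child in node.children:
        c, l, p, t = stats(child, in_link)
        chars, link_chars, punct, tags = chars + c, link_chars + l, punct + p, tags + t
    return chars, link_chars, punct, tags


def score(node):
    # Assumed scoring: non-link text per tag, weighted by punctuation count.
    chars, link_chars, punct, tags = stats(node)
    text_density = (chars - link_chars) / tags
    return text_density * math.log(punct + 2)


def extract_main_text(html):
    """Return the text of the highest-scoring block, or "" if none exists."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    candidates = [n for n in iter_nodes(builder.root) if n.tag in BLOCK_TAGS]
    if not candidates:
        return ""
    best = max(candidates, key=score)
    return " ".join(n.text for n in iter_nodes(best) if n.text)
```

Navigation bars and footers are mostly link text with little punctuation, so they score low under these assumptions, while a block of article paragraphs scores high. A real news page would likely need a more careful candidate selection and a tuned formula.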
Similar resources
Intelligent and Robust Genetic Algorithm Based Classifier
The concepts of robust classification and intelligently controlling the search process of genetic algorithm (GA) are introduced and integrated with a conventional genetic classifier for development of a new version of it, which is called Intelligent and Robust GA-classifier (IRGA-classifier). It can efficiently approximate the decision hyperplanes in the feature space. It is shown experime...
An Improved Intelligent Algorithm Based on the Group Search Algorithm and the Artificial Fish Swarm Algorithm
This article introduces two swarm intelligent algorithms, a group search optimizer (GSO) and an artificial fish swarm algorithm (AFSA). A single intelligent algorithm always has both merits in its specific formulation and deficiencies due to its inherent limitations. Therefore, we propose a mixture of these algorithms to create a new hybrid optimization algorithm known as the group search-artif...
Text Classification Based on Deep Textual Parsing
The problem of classifying text based on the deep parsing structure is addressed. An algorithm is proposed for document classification tasks where counts of words or n-grams are insufficient. The parse tree kernel method at the level of paragraphs, based on anaphora, rhetoric structure relations and communicative actions linking phrases in the parse thicket, is considered.
An Algorithm For Open Text Semantic Parsing
This paper describes an algorithm for open text shallow semantic parsing. The algorithm relies on a frame dataset (FrameNet) and a semantic network (WordNet), to identify semantic relations between words in open text, as well as shallow semantic features associated with concepts in the text. Parsing semantic structures allows semantic units and constituents to be accessed and processed in a mor...
A Webpage Classification Algorithm Concerning Webpage Design Characteristics
Owing to the booming growth of Internet technology, the number of web documents has significantly increased over the Internet. If the webpage can be effectively managed, the knowledge demanders (i.e., Internet users) can efficiently absorb and use the knowledge documents; it has become the core topic in this information explosion era. Webpage classification technology with high accuracy can imp...
Journal
Journal title: Academic Journal of Computing & Information Science
Year: 2022
ISSN: 2616-5775
DOI: https://doi.org/10.25236/ajcis.2022.050403